Reasoning support for evaluators #42482
base: main
Conversation
Add pyrit and not remove the other one
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_coherence/_coherence.py
…tors/_coherence/_coherence.py Co-authored-by: Ankit Singhal <[email protected]>
Pull Request Overview
This PR adds support for reasoning models to evaluators by introducing an `is_reasoning_model` keyword parameter. When set, this parameter updates the evaluator configuration appropriately for reasoning models, enabling better integration with Azure OpenAI's reasoning capabilities.
Key Changes:
- Added `is_reasoning_model` parameter to all evaluators' constructors
- Updated `QAEvaluator` to propagate this parameter to child evaluators
- Added defensive parameter checking in `GroundednessEvaluator` for backward compatibility
- Updated documentation across evaluators to describe the new parameter
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `_similarity/_similarity.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `_retrieval/_retrieval.py` | Added `is_reasoning_model` parameter support |
| `_response_completeness/_response_completeness.py` | Added `is_reasoning_model` parameter and improved formatting |
| `_relevance/_relevance.py` | Added `is_reasoning_model` parameter support |
| `_qa/_qa.py` | Updated to propagate `is_reasoning_model` to child evaluators |
| `_groundedness/_groundedness.py` | Added parameter support with backward compatibility checks |
| `_fluency/_fluency.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `_base_prompty_eval.py` | Updated to pass `is_reasoning_model` to `AsyncPrompty.load` |
| `_base_multi_eval.py` | Minor import formatting improvement |
| `_coherence/_coherence.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `CHANGELOG.md` | Documented the new feature and bug fix |
...valuation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_coherence/_coherence.py
…tors/_groundedness/_groundedness.py Co-authored-by: Copilot <[email protected]>
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
… evaluate_kwargs pop scope fix
… docstring; tests; add AZEVAL_USE_PROMPTFLOW override.
…; improve reasoning error hints; add tests
…ader_order_debug_sample.py
Pull Request Overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
```diff
 if is_reasoning_model:
-    parameters = configs.get("model", {}).get("parameters", {})
+    parameters = configs.get("model", {}).get("parameters") or {}
```
[nitpick] The expression `or {}` is redundant since `dict.get()` with a default empty dict already handles the None case. Consider simplifying to `parameters = configs.get("model", {}).get("parameters", {})`.
Suggested change:

```diff
-parameters = configs.get("model", {}).get("parameters") or {}
+parameters = configs.get("model", {}).get("parameters", {})
```
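For what it's worth, the two forms are not strictly interchangeable: `dict.get(key, default)` only applies the default when the key is *missing*, while `or {}` also replaces an explicit `None` (or any other falsy value). A quick illustration:

```python
# Key present but explicitly None, e.g. an empty "parameters:" block in YAML.
configs = {"model": {"parameters": None}}

with_default = configs.get("model", {}).get("parameters", {})  # default is not used
with_or = configs.get("model", {}).get("parameters") or {}     # None coerced to {}

print(with_default)  # None
print(with_or)       # {}
```

So the `or {}` form in the diff is the safer choice if the configuration can carry a present-but-empty `parameters` entry.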
```python
if is_reasoning_model:
    # Merge sanitized parameters into the override model dict so PF uses them
    base_params = _extract_prompty_parameters(source)
    sanitized = _sanitize_reasoning_parameters(base_params)
    override_params = dict(model.get("parameters", {}) or {})
    override_params.update(sanitized)
    model["parameters"] = override_params
    kwargs["model"] = model
```
This reasoning-model parameter sanitization code is unreachable because it is placed after the early return on line 69, which is taken when `is_reasoning_model` is True. The logic should be moved before the early return, or the condition should be restructured.
```python
_use_run_submitter_client = cast(Optional[bool], evaluate_kwargs.pop("_use_run_submitter_client", None))
_use_pf_client = cast(Optional[bool], evaluate_kwargs.pop("_use_pf_client", None))
```
The variable name is incorrect: this should call `kwargs.pop()` rather than `evaluate_kwargs.pop()`, since the function operates on the `kwargs` dict referenced in the original code, not on a parameter named `evaluate_kwargs`.
```diff
-import os, logging
+import os
+import logging
+from inspect import signature
```
The `signature` import from `inspect` is not used in this file and should be removed to avoid an unused import.

Suggested change:

```diff
-from inspect import signature
```
```python
# Ensure we do not override prompty temperature/max_tokens in model parameters
# Only extra_headers should be present in parameters added by code
model_cfg = kwargs.get("model")
assert isinstance(model_cfg, dict)
params = model_cfg.get("parameters", {})
# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
```
[nitpick] The comment is misleading. The test verifies that temperature and max_tokens are NOT in the parameters dict, but the comment suggests they should be preserved from the prompty file. Consider clarifying that these parameters should be removed for reasoning models.
Suggested change:

```diff
-# Ensure we do not override prompty temperature/max_tokens in model parameters
-# Only extra_headers should be present in parameters added by code
+# For reasoning models, temperature and max_tokens should be removed from parameters.
+# Only extra_headers should be present in parameters added by code.
 model_cfg = kwargs.get("model")
 assert isinstance(model_cfg, dict)
 params = model_cfg.get("parameters", {})
-# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
+# Our code only sets extra_headers inside parameters; temperature/max_tokens are intentionally omitted for reasoning models.
```
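To make the intent of that assertion concrete, a minimal check along these lines (a sketch; the real test harness and fixture names are not shown in this diff) would fail loudly if the sanitization ever regressed:

```python
def assert_reasoning_parameters_sanitized(model_cfg: dict) -> None:
    # For reasoning models, temperature/max_tokens must be absent;
    # only code-injected keys such as extra_headers may remain.
    params = model_cfg.get("parameters") or {}
    assert "temperature" not in params, "temperature must be stripped for reasoning models"
    assert "max_tokens" not in params, "max_tokens must be stripped for reasoning models"


# Passes: only extra_headers is present.
assert_reasoning_parameters_sanitized(
    {"parameters": {"extra_headers": {"x-ms-useragent": "azure-ai-evaluation"}}}
)
```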
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines